Abstract.

In recent years, the number of elderly people in the population has increased
because of rapid advancements in the medical field, which makes it necessary to take
care of old people. Accidental fall incidents are life-threatening and can lead to death
if first aid is not given to the injured person. Immediate response and medical
assistance are necessary in case of accidental fall incidents involving elderly people. The
research community has explored various fall detection systems for the early detection of
fall incidents; however, these systems still have numerous limitations, such as expensive
sensors, wearable sensors that are hard to wear all the time, cameras that violate the
privacy of the person, and computational complexity. To address these limitations, we
propose a novel set of integrated features that consists of mel-frequency cepstral
coefficients (MFCC), gammatone cepstral coefficients (GTCC), and spectral skewness. We
employed a decision tree for classification in both binary and multi-class problems.
We obtained an accuracy of 91.39%, precision of 96.19%, recall of 91.81%, and F1-score
of 93.95%. Moreover, we compared our method with existing state-of-the-art methods, and
our method achieved higher results than the other methods. Experimental results
demonstrate that our method is reliable for use in medical centers, nursing houses, old
houses, and health care provisions.

Keywords: Decision tree, fall incidents, Environmental Sounds, Machine Learning, Old 
houses. 


INTRODUCTION 

The elderly population of the world is growing at an alarming rate. In 2015,
the number of elderly people, i.e., people aged 60 and above, was estimated at 901
million; by 2030 it is expected to reach 1.4 billion, and it may surpass 2.1 billion by
2050 [1]. As the elderly population increases, it brings a slew of new challenges.
According to [2], the majority of elderly individuals prefer to live alone and are concerned
about their privacy. According to [3], the majority of Canadian elderly individuals live on
their own. This trend is also being observed in European countries. Elderly people have a
higher risk of falling because muscle weakness makes it hard for them to control their
movements. Falls have a serious impact on health and can result in life-threatening injuries
or even death. The most devastating problem after an accidental fall is that, in most
situations, the patient is unable to ask for help due to unconsciousness [4], which further
increases the chances of permanent injury or even death. In this scenario, the elderly
must be adequately cared for, and assistance must be provided immediately [5]. Manual
methods of caring for elderly people exist, such as nursing at home, but over longer
periods they are costly, and most people cannot bear those expenses. The above discussion
shows how accidental falls have a financial impact on the elderly population and cause
significant health problems for elderly people who live alone. This alarming situation, and
the large number of people who die as a result of falls, have motivated researchers to
develop means of detecting falls and taking countermeasures after falls to avoid fatal
consequences [6].

There are several types of sensor-based fall detection systems, including wearable
sensor-based, ambient acoustic sensor-based, and vision sensor-based systems [7]. Wearable
gadgets are worn on the body and detect any unusual activities. There are various wearable
gadgets, e.g., accelerometers, gyroscopes, and smartwatches. Their advantages include
ease of use, high accuracy, low power consumption, and low weight. However, elderly people
tend to forget to wear their devices, or the devices are obtrusive enough that some people
find them difficult to wear. The acoustic-based fall detection system [8], also known as an
ambient or contactless fall detection system, consists of microphones, acoustic sensors,
and floor acoustic sensors. A vision-based fall detection system uses surveillance cameras,
mobile cameras, and 3D and 2D cameras to monitor patients or fall-vulnerable people
[9, 10, 11]. The privacy of elderly individuals is a serious concern: when vision-based
sensors are used, their privacy is violated [13, 14].

There have been numerous studies on wearable fall detection systems. In [15], a
wearable sensor placed around the waist of the elderly person is used to detect falls
through acceleration analysis. The motion sensor in this method was a triaxial
accelerometer, the ADXL345. The quaternion algorithm helps in monitoring the patient's
daily activities and in detecting falls. Using a uniform resource locator (URL), an alert
message containing the patient's location was delivered to the respective caretaker once
the fall event was detected. In [12], a new smart device was developed for detecting fall
events and sending alert messages. A 3D accelerometer and a gyroscope were used to
develop this model. First, activities of daily life (ADLs) and falls were differentiated.
The smart device was then developed by introducing the k-nearest neighbor (KNN)
algorithm and a sliding window. It is composed of a smartphone and a wearable motion
sensor. The wearable smart sensor consisted of a Bluetooth module, a gyroscope, and a
triaxial accelerometer [16, 17]. The sensor was attached to a vest worn by the elderly
person. The smart sensor captured the real-time angular velocity and resultant
acceleration of ADLs.

This data stream is sent via Bluetooth to a phone, which runs the KNN algorithm
with a sliding window to analyze the data and detect fall events. The accuracy,
sensitivity, and specificity achieved by the system are 97.9%, 94%, and 99%, respectively.
In [18], wearable sensors were combined with a location sensor to develop a fall
detection system. The system aimed to detect fall events in real-life cases, including
situations in which falls were difficult to distinguish. The performance of context-based
reasoning improved notably. The study concluded that it is better to use a combination of
both types of sensors to achieve good results. A similar combination of wearable sensors
and a location sensor was reported in [19], with the same design goal and the same
conclusion: context-based reasoning performance was significantly enhanced, and combining
both types of sensors gave the best performance. The above systems perform well in terms
of accuracy; however, accelerometer-based fall detection techniques have several
drawbacks. For example, sudden changes in acceleration occur in both fall events and
activities of daily living (ADLs), which in turn makes it difficult to distinguish between
the two.

The acoustic or ambient-based fall detection system includes a microphone that
senses the surroundings and records information about human activities from the
environment. The key benefit of an acoustic sensor-based method is that the elderly do
not have to worry about wearing gadgets. Instead, the system passively senses the
environment, which benefits the elderly because their privacy is preserved [20]. In [21],
an acoustic-FADE system was developed to keep track of the elderly and alert caregivers
in the event of a fall. In this system, a circular microphone array is used to record the
sound in the room. Based on the source location, the recorded sound is classified as fall
or non-fall. The steered response power is employed to determine the position, and the
phase transform technique improves robustness [22]. In [23], a fall event detection system
was designed that used MFCC features extracted from footsteps. A one-class support vector
machine was employed for classification. Similarly, in [24], a machine learning-based
approach was developed to detect fall incidents. In [25], MFCC features were used to
detect fall incidents.

That technique used an SVM to discriminate between fall and non-fall
incidents. In [26], a supervised fall detection algorithm was built on a wireless
communication device, i.e., a smartphone microphone: falls were performed by different
participants and recorded using a smartphone placed within a distance of 5 m from them.
This system will probably not work when the volunteer is far from the smartphone or in
another area. After examining various features and supervised algorithms, the authors used
spectrogram features as the input to an artificial neural network and achieved an accuracy
of 98%. Similarly, in [27], a microphone array was deployed on the premises of
fall-vulnerable people. The initial step in this method is to calculate the energy of the
acquired signal. If the value goes beyond a threshold, a sound localization technique is
carried out to eliminate possible false alarms. Finally, the alarm is removed if the sound
originated from a specific height. The authors evaluated the method on simulated human
falls performed by a single stunt performer falling on a mattress.

Our main contributions are as follows:
• We propose a novel set of features that captures maximum detail from the audio
signals.
• To validate our approach, we performed binary-class and multi-class experiments
to check the effectiveness of the proposed system.
• We conducted extensive experiments on the daily sounds dataset to validate
the superiority of our method.

The remainder of the paper is organized as follows: Section II discusses the proposed
methodology, Section III presents the experimental results and discussion, and Section IV
concludes our work.


PROPOSED METHODOLOGY 

The main purpose of the proposed system is to detect human fall incidents and
to classify environmental sounds. Our system comprises two main stages, i.e., feature
extraction and classification. Initially, we extracted three spectral features from a
standard, publicly available dataset, i.e., the Daily Sounds dataset. After feature
extraction, we applied different machine learning classifiers, namely Naive Bayes (NB),
KNN, linear discriminant analysis (LDA), and decision tree (DT), to differentiate fall
incidents from non-fall incidents. DT outperformed all the other machine learning
classifiers. The working mechanism of the proposed system is illustrated in Figure 1.

Figure 1. Proposed System.

Dataset 


In this paper, we used a standard dataset for experimentation, namely the Daily
Sounds dataset [28]. This dataset contains 1049 audio samples in 18 different classes of
sounds produced by humans and their actions: breathing, dishes, door clapping, electrical
shaver, glass breaking, hairdryer, keys, paper tear, female scream, water falling, yawn,
sneeze, male scream, laugh, handclapping, female cry, door opening, and cough. To avoid
external interference, all the data was recorded during nighttime. We used 898 audio
samples for training and 151 audio samples for evaluation. The research community
considers the fall and panic sounds as the fall class and the other environmental sounds
as the non-fall class, so we follow the same convention in this work.


Features Extraction 

Discriminative feature extraction is necessary for an efficient and reliable fall
incident detection system. After extensive experimentation, we manually selected three
spectral features for this work: MFCC, GTCC, and spectral skewness. We extracted these
features from the audio to design a fall incident detection system. The detailed feature
extraction mechanism is discussed in the subsequent sections.
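
The fused descriptor can be sketched as a simple concatenation: fourteen MFCC values,
fourteen GTCC values, and one spectral skewness value give the 29-dimensional vector used
in the experiments. The sketch below is illustrative only. The function name
`fuse_features`, the frame-level input shapes, and the averaging over frames are
assumptions made for this example and are not taken from the original implementation.

```python
import numpy as np

def fuse_features(mfcc, gtcc, skewness):
    """Fuse frame-level features into one 29-dimensional clip-level vector.

    mfcc:     array of shape (n_frames, 14)
    gtcc:     array of shape (n_frames, 14)
    skewness: array of shape (n_frames,)
    Averaging over frames is an assumption made for this sketch.
    """
    mfcc = np.asarray(mfcc, dtype=float)
    gtcc = np.asarray(gtcc, dtype=float)
    skewness = np.asarray(skewness, dtype=float)
    return np.concatenate([mfcc.mean(axis=0), gtcc.mean(axis=0), [skewness.mean()]])

# Random placeholder values stand in for real features of a 50-frame clip.
rng = np.random.default_rng(0)
vec = fuse_features(rng.normal(size=(50, 14)),
                    rng.normal(size=(50, 14)),
                    rng.normal(size=50))
print(vec.shape)  # (29,)
```

Each audio sample then contributes one such vector to the classifier, so the training
matrix would have 898 rows and 29 columns under these assumptions.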


MFCC 

We extracted fourteen-dimensional MFCC features from the audio by using Eq.
2. We extracted MFCC as follows: initially, we applied the Fourier Transform on the
logarithm of the power spectrum of a signal. We used a sampling rate of 0.5, and the
audio signals were used to generate the Mel-frequency cepstrum. We achieved this in
three steps. First, the entire signal is divided into windows using the Hamming window
technique. Second, the square of the Mel-spectrum, which is also called the power
spectrum $|f(w)|^2$, is calculated by Eq. 1, given below.

$$p(N_k) = \ln\left[\sum_{k=1}^{x} |f(w)|^{2}\right] \tag{1}$$


Finally, the acquired Mel-frequency coefficients undergo the Discrete Cosine
Transform (DCT). We obtained the cepstral coefficients as the output of the DCT. The
log operation is the non-linear rectification step, which is performed before computing
the DCT. The features obtained after this cosine transform step are given by the
following equation.

$$\mathrm{mfcc}_{\ell} = \sum_{k=0}^{N-1} p(N_k)\,\cos\left[\ell\,(k - 0.5)\,\frac{\pi}{N}\right], \qquad \ell = 1, 2, \ldots, u \tag{2}$$
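
The MFCC steps described above (Hamming windowing, power spectrum, logarithm of the Mel
filter-bank energies, and DCT) can be sketched with NumPy alone. This is a minimal
illustration, not the implementation used in the paper. The 16 kHz sampling rate, the
400-sample frames with a 160-sample hop, the 512-point FFT, the 26 triangular Mel filters,
and all function names are assumptions chosen for the example. The filters are indexed
k = 1, ..., N so that the (k - 0.5) offset gives the standard DCT-II.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):        # rising edge of the triangle
            bank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge of the triangle
            bank[m - 1, k] = (right - k) / (right - center)
    return bank

def mfcc(signal, sr, n_coeffs=14, n_filters=26, frame_len=400, hop=160, n_fft=512):
    signal = np.asarray(signal, dtype=float)
    # Step 1: split the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    index = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[index] * np.hamming(frame_len)
    # Step 2: power spectrum of every frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # Step 3: logarithm of the Mel filter-bank energies (small offset avoids log(0)).
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # Step 4: DCT-II over the filter index, keeping coefficients 1..n_coeffs.
    k = np.arange(1, n_filters + 1)
    ell = np.arange(1, n_coeffs + 1)
    basis = np.cos(np.pi * np.outer(ell, k - 0.5) / n_filters)
    return log_mel @ basis.T

# One second of a synthetic 440 Hz tone with a little noise as a stand-in for real audio.
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
audio = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.normal(size=sr)
features = mfcc(audio, sr)
print(features.shape)  # (98, 14)
```

The result holds one fourteen-dimensional MFCC vector per frame; how the frames are
pooled into a clip-level descriptor is a separate design choice.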


GTCC 

We computed fourteen-dimensional Gammatone Cepstral Coefficient (GTCC)
features from the audio of fall and non-fall events. GTCC is a feature extraction
technique used for ambulatory EEG signals. This technique is also known as
Gammatone Frequency Cepstral Coefficients (GFCC). Its computational cost is similar to
that of the MFCC method, but it delivers better performance. The gammatone filter bank
is applied to the pre-processed audio signals, and the sub-band spectrum is obtained as
output. In the non-linear rectification step, a cubic root operation is applied to the
power spectrum, followed by the DCT to obtain the GTCC coefficients. We computed the
GTCC features by using the following equation.


$$\mathrm{Gamma}_k = \sqrt{\frac{2}{G}} \sum_{b=0}^{G-1} \log_{10}(S_b)\,\cos\left[\frac{\pi b}{G}\left(k - \frac{1}{2}\right)\right], \qquad 1 \le k \le K \tag{3}$$


In the above equation, $\mathrm{Gamma}_k$ is the $k$-th GTCC, $S_b$ is the energy in the $b$-th
sub-band of the spectrum, $K$ is the number of GTCC cepstral coefficients, and $G$ is the
number of gammatone filters.
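
The cosine transform in the GTCC definition can be illustrated in a few lines, assuming
the commonly used form Gamma_k = sqrt(2/G) * sum over b of log10(S_b) * cos[(pi*b/G)(k - 1/2)]
with b = 0, ..., G-1. The sketch below takes the sub-band energies as given and does not
implement the gammatone filter bank itself. The function name, the choice of 32
sub-bands, and the placeholder energy values are assumptions for illustration.

```python
import numpy as np

def gtcc_from_subband_energies(S, n_coeffs=14):
    """Cepstral coefficients from positive sub-band energies S_0 .. S_{G-1}.

    Implements Gamma_k = sqrt(2/G) * sum_b log10(S_b) * cos[(pi*b/G) * (k - 1/2)]
    for k = 1 .. n_coeffs.
    """
    S = np.asarray(S, dtype=float)
    G = S.shape[-1]
    b = np.arange(G)
    k = np.arange(1, n_coeffs + 1)
    basis = np.cos(np.pi * np.outer(k - 0.5, b) / G)   # shape (n_coeffs, G)
    return np.sqrt(2.0 / G) * (np.log10(S) @ basis.T)

# Flat unit energies have zero log-energy, so every coefficient vanishes.
flat = gtcc_from_subband_energies(np.ones(32))
# Placeholder energies that rise smoothly across the 32 sub-bands.
rising = gtcc_from_subband_energies(np.linspace(1.0, 2.0, 32))
print(flat.shape, rising.shape)  # (14,) (14,)
```

Passing a matrix with one row of sub-band energies per frame returns one row of
coefficients per frame, because the transform acts on the last axis.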


Spectral Skewness 

Spectral skewness measures the symmetry of the spectrum around its centroid. In
phonetics, spectral skewness is also called spectral tilt, and it is used together with
other spectral moments to distinguish the place of articulation. For harmonic signals,
spectral skewness indicates the relative strength of the lower and higher harmonics. We
computed the spectral skewness as follows.

$$\mathrm{skewness} = \frac{\sum_{k=b_1}^{b_2} (f_k - \mu_1)^{3}\, s_k}{\mu_2^{3} \sum_{k=b_1}^{b_2} s_k} \tag{4}$$

In the above equation, $\mu_1$ is the spectral centroid, $\mu_2$ is the spectral spread, $b_1$ and
$b_2$ are the band edges, $s_k$ is the spectral value at bin $k$, and $f_k$ is the frequency of
bin $k$ in Hertz.
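
As a hedged illustration, spectral skewness can be computed as the third central moment
of the spectrum, normalised by the cube of the spectral spread, with the spectral
centroid as the mean. The sketch below assumes this standard definition and takes the
whole spectrum as the band. The function name and the two synthetic spectra are made up
for the example.

```python
import numpy as np

def spectral_skewness(s, f):
    """Skewness of spectral values s (non-negative) at bin frequencies f (Hz)."""
    s = np.asarray(s, dtype=float)
    f = np.asarray(f, dtype=float)
    total = s.sum()
    centroid = (f * s).sum() / total                               # mu_1
    spread = np.sqrt((((f - centroid) ** 2) * s).sum() / total)    # mu_2
    return (((f - centroid) ** 3) * s).sum() / (spread ** 3 * total)

f = np.linspace(0.0, 8000.0, 257)
symmetric = np.exp(-0.5 * ((f - 4000.0) / 500.0) ** 2)   # bump centred at 4 kHz
low_heavy = np.exp(-f / 1000.0)                          # energy concentrated at low frequencies

sk_sym = spectral_skewness(symmetric, f)
sk_low = spectral_skewness(low_heavy, f)
print(sk_sym)  # approximately zero for a symmetric spectrum
print(sk_low)  # positive: long tail towards high frequencies
```

A spectrum that is symmetric around its centroid yields a value near zero, while a
spectrum dominated by low frequencies with a long high-frequency tail yields a positive
value.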


Classification 

Machine learning classifiers provide promising results for classification
problems. In this paper, we focused on using different machine learning classifiers to
detect fall incidents. We employed NB, KNN, LDA, and DT; DT outperformed all the other
machine learning classifiers. Figure 2 shows the structure of the DT algorithm. DT is a
powerful classification algorithm used in numerous fields such as medicine, image
processing, audio processing, video processing, and other identification problems. DT is
a sequential model that combines a series of simple tests efficiently and cohesively; in
each test, a numeric feature is compared with a specific threshold value. Constructing a
DT is easier than learning the weights of the nodes of a neural network. A DT comprises
nodes and branches, where each node represents a feature to be tested and each branch
represents a value that the node can take. There are two classes in our problem, fall and
non-fall, and DT performs well on binary classification problems. Therefore, we employed
DT to detect fall incidents.


Figure 2. Decision Tree. 
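
The idea that each node of a DT compares one numeric feature with a threshold can be
sketched as a search for the single best test. The example below is an illustrative
toy, not the training procedure used in the paper: the Gini impurity criterion, the
function names, and the six two-dimensional samples are all assumptions. A full tree
would apply the same search recursively to each side of the split.

```python
import numpy as np

def gini(labels):
    """Gini impurity of an array of non-negative integer class labels."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels) / len(labels)
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return (feature index, threshold, weighted impurity) of the best single test."""
    best = (None, None, np.inf)
    n = len(y)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])                     # sorted distinct values
        for t in (values[:-1] + values[1:]) / 2.0:      # midpoints as candidate thresholds
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best

# Toy data: label 1 stands for "fall", label 0 for "non-fall" (made-up feature values).
X = np.array([[0.1, 5.0], [0.2, 1.0], [0.3, 4.0],
              [0.8, 2.0], [0.9, 6.0], [1.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

feature, threshold, impurity = best_split(X, y)
print(feature, threshold, impurity)  # feature 0 separates the two classes perfectly
```

In this toy set the first feature alone separates the classes, so the root node tests
it against a threshold between 0.3 and 0.8 and both branches become pure leaves.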


EXPERIMENTAL RESULTS AND DISCUSSION 

In this section, we discuss the experimental setup and the detailed results for
detecting fall and non-fall events. We evaluated the performance of the proposed system
using accuracy, precision, recall, and F1-score. The details of the experiments and
their results are discussed in the subsequent sections.

Comparative analysis of machine learning algorithms

The aim of this experiment is to detect fall incidents. We extracted 29-dimensional
features (MFCC, GTCC, and spectral skewness) from the audio samples of fall and non-fall
incidents. We used 898 samples for training and 151 samples for evaluating the
trained model. We treated the panic and fall sounds of the dataset as fall incidents and
the environmental sounds as non-fall incidents. We applied four different machine
learning algorithms, namely NB, KNN, LDA, and DT, to check their effectiveness in
detecting fall incidents. From the results given in Table 1, we can observe that
(MFCC-GTCC-spectral skewness-LDA) performed worst and achieved an accuracy of 73.51%,
precision of 84.76%, recall of 78.76%, and F1-score of 81.65%. The (MFCC-GTCC-spectral
skewness-KNN) performed second-best in terms of accuracy and achieved an accuracy of
80.79%, precision of 100%, recall of 78.35%, and F1-score of 87.86%, while the
(MFCC-GTCC-spectral skewness-DT) performed best and achieved an accuracy of 91.39%,
precision of 96.19%, recall of 91.81%, and F1-score of 93.95%. From these results, we
conclude that (MFCC-GTCC-spectral skewness-DT) captures the most information from the
audio of fall and non-fall incidents. This system is reliable and effective for
detecting fall incidents.


Table 1. Comparative analysis of machine learning techniques.

Method    Accuracy%    Precision%    Recall%    F1-score%
NB        73.51        100           72.41      84
KNN       80.79        100           78.35      87.86
LDA       73.51        84.76         78.76      81.65
DT        91.39        96.19         91.81      93.95


Confusion Matrix for Binary class 

The confusion matrix shows the correct and incorrect predictions of a classifier.
It contains four values: true positives, true negatives, false positives, and false
negatives. The confusion matrix for DT is shown in Figure 3. We can observe that our
method (MFCC-GTCC-spectral skewness-DT) misclassified 4 non-fall incidents as fall
incidents and 9 fall incidents as non-fall incidents. It correctly classified 101 fall
incidents as fall and 37 non-fall incidents as non-fall. From the confusion matrix in
Figure 3, we conclude that our method is effective at distinguishing fall and non-fall
incidents.

Figure 3. Confusion matrix (1) Fall and (2) non-fall event.

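
The reported metrics can be recomputed from the four confusion-matrix counts stated
above (101 true positives, 4 false positives, 9 false negatives, 37 true negatives),
taking "fall" as the positive class. The sketch below uses the standard definitions of
the four metrics; the function name is made up for the example.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = binary_metrics(tp=101, fp=4, fn=9, tn=37)
print(f"{acc:.2%} {prec:.2%} {rec:.2%} {f1:.2%}")
# prints: 91.39% 96.19% 91.82% 93.95%
```

The recall is 101/110 = 0.91818..., which rounds to 91.82%; the reported 91.81% is
consistent with truncating rather than rounding the same value.
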
Performance evaluation of Environmental Sounds Classification 


The aim of this experiment is to classify ten different classes of the daily sounds
dataset. The ten classes are breathing, dishes, door clapping, electrical shaver, glass
breaking, hairdryer, keys, paper tear, scream, and water falling. We observe that there is
a very high correlation among these ten classes. We extracted 29-dimensional
(MFCC-GTCC-spectral skewness) features from the audio samples of these ten classes to
train DT. We observe from Table 2 that (MFCC-GTCC-spectral skewness-DT) performed worst
on scream, with an accuracy of 88%, precision of 94.56%, recall of 99.10%, and F1-score
of 96.20%. Our method performed second-best on paper tear, with an accuracy of 98.23%,
precision of 97%, recall of 95.21%, and F1-score of 97.88%, while it performed best on
water falling, electrical shaver, and glass breaking, with an accuracy, precision,
recall, and F1-score of 100% each. The detailed results of all the classes are given in
Table 2. We conclude that (MFCC-GTCC-spectral skewness-DT) is an effective and reliable
system for detecting highly correlated classes.

Table 2. Performance of environmental sounds classification.

Classes              Accuracy%    Precision%    Recall%    F1-score%
Breathing            94.12        97            96.2       98.11
Dishes               92.78        98.33         93.89      98
Door Clapping        92.09        oo            91.10      94.58
Electrical Shaver    100          100           100        100
Glass Breaking       100          100           100        100
Hair Dryer           96.89        92.41         09         96.98
Keys                 96.17        92.56         09         05.09
Paper Tear           98.23        97            95.21      97.88
Scream               88           94.56         99.10      96.20
Water Falling        100          100           100        100

Confusion Matrix for environmental Sounds Classification

The confusion matrix for the ten classes is shown in Figure 4. The ten classes
are breathing, dishes, door clapping, electrical shaver, glass breaking, hairdryer, keys,
paper tear, scream, and water falling, respectively. From the confusion matrix of the
multi-class scenario in Figure 4, we observe that our method misclassified only 20
incidents while correctly classifying 204 incidents. Although these ten classes are
highly correlated, our method is robust in detecting all ten classes. We conclude
that our method is capable of detecting complex environmental sounds and can effectively
be used for the detection of fall incidents.

Figure 4. Confusion matrix for Environmental Sounds Classification.
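
For a multi-class confusion matrix, per-class precision and recall follow from the
column and row sums, and overall accuracy is the share of samples on the diagonal. The
sketch below assumes rows hold the true class and columns the predicted class. The 3x3
matrix is a made-up toy and is not the matrix of Figure 4; only the counts 204 and 20
are taken from the text.

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision, recall, and F1 (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)   # column sums: all samples predicted as the class
    recall = tp / cm.sum(axis=1)      # row sums: all samples that truly belong to the class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical toy matrix with three classes and 20 samples per class.
toy_cm = [[18, 2, 0],
          [1, 17, 2],
          [0, 0, 20]]
precision, recall, f1 = per_class_metrics(toy_cm)
print(np.round(recall, 2))

# Overall accuracy implied by the counts reported for the ten-class experiment.
overall = 204 / (204 + 20)
print(f"{overall:.2%}")  # 91.07%
```

With 204 correct and 20 incorrect predictions, the ten-class experiment corresponds to
an overall accuracy of about 91.07% on its 224 evaluated samples.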

Performance comparison with other methods 

The aim of this experiment is to compare the performance of the proposed
system with existing state-of-the-art methods. We compared our method based on
accuracy, precision, recall, and F1-score, as shown in Table 3. We took the results
directly from the respective papers without re-implementing the methods. We observed that
Khan et al. [30] performed worst and achieved an accuracy of 66%, precision of 64%, recall
of 76%, and F1-score of 25%, while Tuncer et al. [29] performed second-best and achieved
an accuracy of 89.17%. Our proposed method performed best among the compared methods and
achieved an accuracy of 91.39%, precision of 96.19%, recall of 91.81%, and F1-score of
93.95%. This comparative analysis demonstrates that our method is superior and effective
for detecting fall incidents. The system can be used in old houses, medical centers, and
homes where elderly people live alone, to detect fall incidents early, save lives, and
deliver first aid sooner.

Table 3. Performance comparison with existing techniques.

Authors                Accuracy%    Precision%    Recall%    F1-score%
Tuncer et al. [29]     89.17        -             -          -
Khan et al. [30]       66           100           60         44
Shaukat et al. [31]    Ti           64            76         25


CONCLUSION

This paper has presented a novel fall incident detection system based on a fused
set of features comprising MFCC, GTCC, and spectral skewness. We used a standard,
publicly available dataset, the daily sounds dataset. We extracted 29-dimensional
features from the fall and non-fall incidents and employed the DT for classification.
Comparative analysis with existing state-of-the-art methods demonstrates that our method
is superior and effective for detecting fall incidents. Our method has the lowest false
alarm rate and achieved an accuracy of 91.39%. Moreover, the proposed method can be used
in real-time environments such as medical centers for monitoring elderly people, nursing
homes, and old houses. In the future, we aim to employ deep learning methods on the fused
set of features and to send the exact location of a fall incident to the caretakers.